Correlated patterns in non-monotonic graded-response perceptrons 
The optimal capacity of graded-response perceptrons storing biased and spatially correlated patterns 
with non-monotonic input-output relations is studied. It is shown that only the structure of the 
output patterns is important for the overall performance of the perceptrons. 



I. INTRODUCTION 

Graded-response perceptrons have been studied intensively in the
past years ([?] and references therein). In particular, it is found
that non-monotonic input-output relations yield interesting retrieval
properties, such as an improvement of the optimal capacity
(see, e.g., [?]).

The studies mentioned above concern patterns that are chosen to be
independent identically distributed random variables with respect to
the sites and the patterns. However, in practical applications one has
to consider sets of data with internal structure. While the effects of
bias and correlations on the optimal capacity have been studied before
for monotonic input-output relations ([?] and references therein), they
have not yet been reported for non-monotonic ones. Filling this gap is
the purpose of this brief report.



In section II we write down the final results of a Gardner analysis
of the capacity problem. In section III we study these results
numerically for some specific input-output relations. The influence of
spatial input correlations on the optimal capacity is determined as a
function of the correlation strength. Concerning bias, the effects of
both input and output bias are analysed. Some concluding remarks are
presented in section IV.



II. REPLICA ANALYSIS 

The graded-response perceptron maps a collection of input patterns
$\{\xi_i^\mu;\ 1 \le i \le N\}$, $1 \le \mu \le p = \alpha N$, with
$\alpha$ the capacity, onto a corresponding set of outputs $\zeta^\mu$ via

$$ \zeta^\mu = g(h^\mu), \qquad h^\mu = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} J_j \xi_j^\mu . \tag{1} $$

Here $g$ is the input-output relation of the perceptron. In (1),
$h^\mu$ is the local field generated by the inputs. The $J_j$ are the
couplings of the perceptron architecture. We focus our attention on
general input patterns specified by



(2) 



The matrix $C$ formed by the elements $C_{ij}$ is taken to be
symmetric and positive. In the sequel we specifically consider
correlations with $m = 0$ and general $C$, and correlations with
$m \neq 0$ and $C_{ij} = \delta_{ij}(v + m^2)$. The latter will be
called biased patterns and $v$ is the variance of the input
distribution.

We allow for a limited output precision in the mapping (1). In other
words, the output that results when the input layer is in the state
$\{\xi_i^\mu\}$ is accepted if

$$ g(h^\mu \pm \kappa) \in I_{\rm out}(\zeta^\mu, \epsilon) = [\zeta^\mu - \epsilon,\ \zeta^\mu + \epsilon] \tag{3} $$



where $\epsilon$ denotes the allowed output-error tolerance and
$\kappa$ the required input stability. In order to compute the
available Gardner volume in $J$-space satisfying (3) we rewrite the
latter as a condition on the local fields

$$ h^\mu \in I^\mu = \{x;\ g(x \pm \kappa) \in I_{\rm out}(\zeta^\mu, \epsilon)\} \tag{4} $$

where, in general, the

$$ I^\mu = \bigcup_{j=1}^{r^\mu} [l_j^\mu, u_j^\mu] \tag{5} $$



form a collection of intervals, not necessarily simply connected,
with $l_j^\mu$, $u_j^\mu$ the lower and upper bounds of the $j$-th
subinterval and $r^\mu$ the number of subintervals defined by the
pattern $\zeta^\mu$. We remark that for monotonic input-output
relations, $r^\mu = 1$. Following the standard Gardner analysis [?] we
use the replica technique to calculate
$v = \lim_{N\to\infty} N^{-1} \langle\langle \ln V \rangle\rangle$ with
$V$ the fractional volume in $J$-space with spherical normalization and
$\langle\langle \cdots \rangle\rangle$ the average over the statistics
of inputs and outputs.
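The collection of intervals in (4) and (5) can be made concrete with a small numerical scan. The following is only a sketch under stated assumptions: the helper names `reversed_wedge` and `field_intervals`, the scan window and the grid resolution are illustrative choices, and the "$\pm\kappa$" in (3) is read here as requiring that both shifted fields be accepted.

```python
import math
from typing import Callable, List, Tuple


def reversed_wedge(x: float, gamma: float = 1.0) -> float:
    """sgn[(x + 1/gamma) x (x - 1/gamma)]; 0 is returned at the three zeros."""
    cubic = (x + 1.0 / gamma) * x * (x - 1.0 / gamma)
    return math.copysign(1.0, cubic) if cubic != 0.0 else 0.0


def field_intervals(
    g: Callable[[float], float],
    zeta: float,
    eps: float,
    kappa: float,
    lo: float = -4.0,
    hi: float = 4.0,
    steps: int = 8000,
) -> List[Tuple[float, float]]:
    """Grid estimate of {x : g(x + kappa) and g(x - kappa) lie in [zeta - eps, zeta + eps]}.

    The window [lo, hi] truncates intervals that extend to infinity.
    """

    def accepted(x: float) -> bool:
        # Assumption: both signs of the stability shift must be accepted.
        return all(abs(g(x + s * kappa) - zeta) <= eps for s in (1.0, -1.0))

    intervals: List[Tuple[float, float]] = []
    start = last = None
    dx = (hi - lo) / steps
    for k in range(steps + 1):
        x = lo + k * dx
        if accepted(x):
            if start is None:
                start = x          # a new subinterval opens here
            last = x
        elif start is not None:
            intervals.append((start, last))   # the subinterval just closed
            start = None
    if start is not None:
        intervals.append((start, last))
    return intervals
```

For an output $\zeta = +1$ with $\epsilon = \kappa = 0$ the reversed wedge with unit gain yields two subintervals, approximately $(-1, 0)$ and $(1, \infty)$ cut off at the window edge, whereas a monotonic relation such as the sign function yields a single one.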

The order parameters occurring in this calculation for
correlated patterns with $m = 0$ are



(6) 



$$ Q^{\lambda\lambda'} = \frac{1}{N} \sum_{i,j} J_i^{\lambda} C_{ij} J_j^{\lambda'}, \qquad \lambda, \lambda' = 1, \ldots, n \tag{7} $$



with $n$ the number of replicas. Since the set of general fixed-point
equations leading to the optimal capacity $\alpha_c$ (obtained when
$V = 0$) in the replica-symmetric (RS) approximation has been discussed
already in [?] (for a simple perceptron with the sign function as
input-output relation), we do not write out explicitly the analogous
formula for the graded-response perceptron with correlated input and
binary output but non-monotonic input-output relations. For technical
reasons the latter are taken to be odd in the field. We just mention
that the essential difference is a splitting of the integrations into
regions of the form $[(u_{j-1} + l_j)/2,\ l_j]$ and
$[u_j,\ (u_j + l_{j+1})/2]$ corresponding to the collection of
intervals (5) (compare eq. (8) in [?]). No closed form for $\alpha_c$
is possible and the solution of these fixed-point equations is rather
tedious. In the next section we present some numerical results for
exponentially decaying spatial correlations.

For biased patterns ($m \neq 0$ and $C_{ij} = \delta_{ij}(v + m^2)$) the
order parameters read



M) 



(8) 
(9) 



Since in this case a closed form for $\alpha_c$ is possible and its
structure is interesting for analysing the effects of the bias $m$, we
write it down explicitly in a first-step replica-symmetry breaking
approximation (RSB1). Applying the Parisi scheme [?] we find



-ln(l+P(l- go ))- 
with, dropping the index $\mu$ in the sequel,



+ C \--[B( Uj ) +B{l j+1 ],-B{u j ),y{u j ) 
+ £(-%),-B(! j ) ) 0) 
where $u_0 = -\infty$, $l_{r+1} = +\infty$,
J a



, 1 , 
y(x) = A(x) + z Qv ^ + z lv a - g , 



(11) 

(12) 

(13) 
with $Dz = dz\,(2\pi)^{-1/2} \exp(-z^2/2)$ the Gaussian measure. We
remark that for zero bias we recover the results given in [?].
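To illustrate how the Gaussian measure $Dz$ weights a collection of intervals such as (5), the following sketch evaluates $\int Dz$ over a union of disjoint intervals in closed form. The function names `H` and `gaussian_weight` are hypothetical, and the example is not the capacity formula of this section, only the elementary building block $\int_l^u Dz$.

```python
import math
from typing import Iterable, Tuple


def H(x: float) -> float:
    """Gaussian tail integral H(x) = int_x^inf Dz = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_weight(intervals: Iterable[Tuple[float, float]]) -> float:
    """Total Gaussian measure of a union of disjoint intervals [l_j, u_j]."""
    return sum(H(lower) - H(upper) for lower, upper in intervals)


if __name__ == "__main__":
    # Two subintervals of the kind a non-monotonic relation produces.
    wedge_plus = [(-1.0, 0.0), (1.0, math.inf)]
    print(gaussian_weight(wedge_plus))
```

For the two subintervals $(-1, 0)$ and $(1, \infty)$ the weight is exactly $1/2$ by symmetry, as expected for an odd input-output relation and an unbiased Gaussian field.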



III. RESULTS 

Two input-output relations are studied for comparison: the piecewise
linear one

$$ g_L(x) = \begin{cases} \mathrm{sgn}(x), & |x| > 1/\gamma \\ \gamma x, & |x| < 1/\gamma \end{cases} \tag{15} $$

as a prototype of a general monotonic function, and the reversed
wedge [?]

$$ g_{RW}(x) = \mathrm{sgn}[(x + 1/\gamma)\, x\, (x - 1/\gamma)] \tag{16} $$

as an example of a non-monotonic one. Here $\gamma$ is called the gain
parameter. In the numerical analysis we restrict ourselves mostly to
$\epsilon = 0$.
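The two relations (15) and (16) translate directly into code. The sketch below is illustrative only; the function names and the convention of returning 0 at the zeros of the cubic in (16) are assumptions made here, not part of the text.

```python
import math


def g_L(x: float, gamma: float) -> float:
    """Piecewise-linear relation (15): gamma * x for |x| < 1/gamma, sgn(x) otherwise."""
    if abs(x) < 1.0 / gamma:
        return gamma * x
    return math.copysign(1.0, x)


def g_RW(x: float, gamma: float) -> float:
    """Reversed wedge (16): sign of (x + 1/gamma) * x * (x - 1/gamma)."""
    cubic = (x + 1.0 / gamma) * x * (x - 1.0 / gamma)
    if cubic == 0.0:
        return 0.0  # assumed convention at x = 0 and x = +/- 1/gamma
    return math.copysign(1.0, cubic)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(x, g_L(x, 1.0), g_RW(x, 1.0))
```

Both functions are odd in $x$. The reversed wedge flips sign inside the window $|x| < 1/\gamma$, which is what produces two disjoint field intervals per output, while $g_L$ approaches the sign function as $\gamma$ grows.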



A. Correlated patterns 

Following [?], we study correlations in the input patterns that are
positive and fall off with the distance between the sites



Cj 



exp 



-il 



(17) 



with $L$ a typical length scale. The parameter $S$ is the correlation
strength inside one input pattern and varies between 0, corresponding
to independent sites, and 1, meaning that all spins in a pattern are
equal. The spatial structure introduced above induces interesting
correlations between the couplings. The latter are positively
correlated when close enough and anticorrelated when further apart.

For the sign function it has been shown [?] that the optimal
capacity, $\alpha_c$, at $\kappa = 0$ remains 2 regardless of the inner
structure of the inputs. Here we analyse in addition $\alpha_c$ as a
function of $\gamma$ for different correlation strengths $S$ in the
case of $g_L$. The results are shown in Fig. 1. Increasing $\gamma$ or
decreasing $S$, $\alpha_c$ increases. In the limit $\gamma \to \infty$,
$g_L$ becomes the sign function such that $\alpha_c$ always approaches
2 because of the argument above.
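As a point of reference for the value 2 quoted above, the following sketch evaluates the standard Gardner result for the sign function with unbiased, uncorrelated patterns, $\alpha_c(\kappa) = [\int_{-\kappa}^{\infty} Dt\,(t + \kappa)^2]^{-1}$. This formula is the well-known baseline and is not derived in the present text; the function names are hypothetical.

```python
import math


def gardner_capacity(kappa: float) -> float:
    """Closed form of 1 / int_{-kappa}^{inf} Dt (t + kappa)^2.

    The integral equals (1 + kappa^2) * Phi(kappa) + kappa * phi(kappa),
    with phi the Gaussian density and Phi its cumulative distribution.
    """
    density = math.exp(-0.5 * kappa * kappa) / math.sqrt(2.0 * math.pi)
    cumulative = 0.5 * math.erfc(-kappa / math.sqrt(2.0))
    return 1.0 / ((1.0 + kappa * kappa) * cumulative + kappa * density)


def gardner_capacity_quadrature(kappa: float, cutoff: float = 12.0,
                                steps: int = 100_000) -> float:
    """Same quantity from a midpoint rule on [-kappa, -kappa + cutoff], as a check."""
    dt = cutoff / steps
    total = 0.0
    for k in range(steps):
        t = -kappa + (k + 0.5) * dt
        total += (t + kappa) ** 2 * math.exp(-0.5 * t * t)
    return math.sqrt(2.0 * math.pi) / (total * dt)
```

At $\kappa = 0$ the integral reduces to $\int_0^\infty Dt\, t^2 = 1/2$, which gives $\alpha_c = 2$. The capacity decreases as the required stability $\kappa$ grows.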




FIG. 1. The optimal capacity $\alpha_c$ as a function of $\gamma$ for $g_L$
with correlated inputs. From top to bottom $S = 0, 0.5, 0.9$.

Numerically we have found that changing $S$ can be seen as an
effective scaling in $\gamma$. Since this scaling behavior can also be
shown analytically for biased patterns, we write it down in the next
subsection.

For the non-monotonic input-output relation $g_{RW}$ the
corresponding results are presented in Fig. 2. Several remarks are in
order. Technically, the solutions of the relevant saddle-point
equations are unique for small values of $S$ for all $\gamma$, but from
$S > S_c = 0.55$ onwards there exist multiple solutions in a growing
interval in $\gamma$. This is due to the non-monotonicity of the
input-output relation. Taking the solution giving the greatest optimal
capacity, we find the constant part $\alpha_c = 2$ (for small $\gamma$
and $S > S_c = 0.55$) of the curves in Fig. 2. It seems that for these
values of $\gamma$ the perceptron is not able to benefit from the
non-monotonicity of the input-output relation because the order
parameter $Q$ remains small.




FIG. 2. The optimal capacity $\alpha_c$ as a function of $\gamma$ for $g_{RW}$
with correlated inputs. From left to right $S = 0, 0.2, 0.7, 0.9$.

It is interesting to see that the maximal value of $\alpha_c$ is the
same independent of the correlation strength, meaning that also for the
non-monotonic $g_{RW}$ the inner structure of the patterns does not
play any role in this respect. This precisely amounts to a scaling of
$\gamma$.



B. Biased patterns 

In this section we study biased input and output patterns. Their
probability distribution is chosen to be

$$ p(x) = \frac{1 + m}{2}\, \delta(1 - x) + \frac{1 - m}{2}\, \delta(1 + x) \tag{18} $$

where $m$ can be different for input and output, thus defining $m_i$
and $m_o$.
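A two-peak distribution of the form (18), with weight $(1+m)/2$ on $+1$ and $(1-m)/2$ on $-1$, has mean $m$ and variance $1 - m^2$. The sketch below samples it and returns these two moments; the function names and the use of a seeded generator are illustrative assumptions.

```python
import random
from typing import List, Tuple


def sample_pattern(n: int, m: float, rng: random.Random) -> List[int]:
    """Draw n independent +/-1 entries with P(+1) = (1 + m) / 2."""
    p_plus = 0.5 * (1.0 + m)
    return [1 if rng.random() < p_plus else -1 for _ in range(n)]


def mean_and_variance(m: float) -> Tuple[float, float]:
    """Exact mean and variance of the two-peak distribution: (m, 1 - m^2)."""
    return m, 1.0 - m * m


if __name__ == "__main__":
    rng = random.Random(1)
    xi = sample_pattern(100_000, 0.2, rng)
    print(sum(xi) / len(xi), mean_and_variance(0.2))
```

Using separate bias values for `sample_pattern` on the input and on the output side corresponds to the two parameters $m_i$ and $m_o$.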

We start with some general properties of the perceptrons defined by
eqs. (15) and (16). Comparing the results (10) with those of [?], we
see that in order to obtain the expressions for biased patterns it is
sufficient to substitute the local field $h$ by $(h - m_i M)/\sqrt{v}$,
and to perform an extra maximization over $M$ in the expressions for
patterns without bias. This tells us that the order parameter $M$
indicating the bias in the couplings, as seen in its definition (8),
shifts the local field such that the condition (4) is optimally
satisfied and hence that the capacity increases. Furthermore, it
naturally introduces two cases: $m_i M = 0$ and $m_i M \neq 0$.
Whenever $m_i$ or $m_o$ is zero, $m_i M = 0$. However, $m_i M = 0$ does
not necessarily imply that $m_i$ or $m_o$ is zero, as we will see
explicitly in the case of the non-monotonic $g_{RW}$. But these points
where $m_i M = 0$ occur rather exceptionally.
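The substitution $h \to (h - m_i M)/\sqrt{v}$ can be checked on a small example. The sketch below assumes a local field of the form $h = N^{-1/2} \sum_j J_j \xi_j$ with spherical normalization $\sum_j J_j^2 = N$, and it assumes the coupling bias to be $M = N^{-1/2} \sum_j J_j$; neither the explicit form of $M$ nor the helper names below are taken from the text. For independent biased inputs with mean $m_i$ and variance $v = 1 - m_i^2$, the field then has mean $m_i M$ and variance $v$, so the shifted and rescaled field is standardized.

```python
import math
import random
from typing import List, Tuple


def spherical_couplings(n: int, rng: random.Random) -> List[float]:
    """Random couplings rescaled so that sum_j J_j^2 = n (spherical normalization)."""
    raw = [rng.gauss(1.0 / math.sqrt(n), 1.0) for _ in range(n)]
    scale = math.sqrt(n / sum(j * j for j in raw))
    return [scale * j for j in raw]


def field_statistics(J: List[float], m: float) -> Tuple[float, float, float]:
    """Return (M, mean of h, variance of h) for independent inputs with bias m."""
    n = len(J)
    M = sum(J) / math.sqrt(n)           # assumed definition of the coupling bias
    v = 1.0 - m * m                     # variance of a +/-1 input with mean m
    mean_h = m * M
    var_h = v * sum(j * j for j in J) / n
    return M, mean_h, var_h


def standardized_field(h: float, m: float, M: float) -> float:
    """The substituted field (h - m * M) / sqrt(v) with v = 1 - m^2."""
    return (h - m * M) / math.sqrt(1.0 - m * m)
```

Under these assumptions the bias enters the field statistics only through the shift $m_i M$ and the scale $\sqrt{v}$, which is why an extra maximization over $M$ is all that distinguishes the biased from the unbiased expressions.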



A closer inspection of the results in section II shows that the
graded-response perceptron satisfies the following analytic scaling
behavior, given an output distribution:



(19) 



a c mi,i,m ;7V«i,e, 



a c m i)2 ,m ; 7V«2,e, 



where $m_{i,1}$ and $m_{i,2}$ are two values of the input bias and
$v_1$ and $v_2$ the two corresponding values of the variance. These
results are valid for both monotonic and non-monotonic input-output
relations. The new insight is that the output statistics is the
important quantity determining the performance of the perceptron. In
general, increasing the bias in the output results in a non-decreasing
optimal capacity.

Concerning RS stability, assured by a negative sign of the replicon
eigenvalue $\lambda_R$ [?], we know that for monotonic non-decreasing
input-output relations and unbiased patterns the following identity
holds (compare [?]):

$$ \mathrm{sgn}[-\lambda_R(m_i, m_o = 0; \gamma, \epsilon, \kappa = 0)] = \mathrm{sgn}\left[\frac{\partial}{\partial \gamma}\, \alpha_c(m_i, m_o = 0; \gamma, \epsilon, \kappa = 0)\right] \tag{20} $$



Together with (19) this relation tells us that varying the input bias
does not change the breaking behavior for a fixed $\gamma$. The
scaling (19) also implies

sgn[-Afl(mi, to = 0; 7, e, k = 0)] 
d 

■ a c (mi,m = 0; 7, e, k = 0) 



sgn 



drrii 



(21) 



For non-monotonic transfer functions we know that replica symmetry
is unstable [?].

For the graded-response perceptron with monotonic input-output
relation $g_L$ and pattern distribution (18) we find the following
additional results concerning the output bias. For $M = 0$ the solution
is stable for all values of the bias in input and output, for each
$\gamma$.




FIG. 3. The optimal capacity $\alpha_c$ (full curves) and the replicon
eigenvalue $\lambda_R$ (dashed curves) for $g_L$ as a function of $\gamma$
with $m_i = 0.2$ and, from bottom to top, $m_o = 0, 0.5, 0.9$.

For $M \neq 0$, from a certain value of $m_o$ onwards the RS solution
becomes unstable for a growing interval of $\gamma$-values. This is
shown in Fig. 3. For these perceptrons it is known [?] that the effect
of breaking is small. Although it grows with increasing output bias, it
is seen that for $m_o < 0.9$ the difference in capacity does not exceed
$10^{-2}$. We remark that the maximum capacity as a function of
$\gamma$ is reached for $\gamma \to \infty$, in agreement with the
result obtained in [?].

For the non-monotonic transfer function $g_{RW}$ the maximal
$\alpha_c$ is obtained for a finite value of $\gamma$, as shown in
Fig. 4, implying that there exists an optimal choice for the width of
the plateaus. This choice depends on the specific parameters of the
pattern distribution.




FIG. 4. The RS (dashed curve) and RSB (full curve) optimal capacity
$\alpha_c$ of $g_{RW}$ as a function of $\gamma$ with $m_i = 0.1$ and,
from bottom to top, $m_o = 0, 0.5, 0.75, 0.8$.



Compared with the monotonic case the overall difference between the
RS and RSB1 solutions is much bigger. The optimal capacity for the
non-monotonic perceptron is always greater than that of the monotonic
one. We note that for $\gamma \to 0$ and $\gamma \to \infty$ the
optimal capacity of $g_{RW}$ approaches that of the sign function. We
remark that, as in the case of correlated input patterns, the order
parameter $M$ gives rise to multiple solutions for small values of
$\gamma$. We take the solution with the highest $\alpha_c$.

A somewhat surprising feature of this perceptron is that a second
maximum develops both in the RS and the RSB1 solution as a function of
$\gamma$ for big values of $m_o$ (see Fig. 4). Qualitatively speaking,
the overall behavior of the input-output relation remains the same
within RSB1. This is the case for all values of the model parameters we
have considered but may be a property of the binary output distribution
(compare [?] for a uniform output). The difference between RS and RSB1
grows with increasing bias.

Between the two maxima, there is a point where $\alpha_c$ does not
depend on the output bias $m_o$. This feature is present both in the RS
and the RSB1 approximation, although for a slightly different value of
$\gamma$ in RSB1. The underlying reason for this is that the solution
of $M$ at these points is zero and that the input-output relation is
odd in the local field, such that the output statistics does not affect
the optimal capacity of the system. Since changing $m_i$ can be
expressed as rescaling $\gamma$, the capacity at these points is the
same for every value of $m_i$ and $m_o$.

We end with the remark that changing the input distribution (18) by
varying the place of the delta peaks in the interval [0, 1] shows a
similar scaling behavior.

IV. CONCLUSIONS

In this brief report we have studied the optimal capacity of
graded-response perceptrons storing biased and spatially correlated
patterns with non-monotonic input-output relations using a first-step
replica-symmetry breaking analysis.

The most important result is that a change in the optimal capacity
due to bias or correlations in the input can be removed by an
appropriate scaling of the relevant parameters defining the
graded-response perceptron. The statistics of the outputs really
determines the performance of the latter.